1 Background [WIP]

Work in Progress


This article is currently a work in progress. Feel free to observe while it gets filled out!

2 Data

2.1 Introduction

Data were gathered for the years 2002-2022, accessed on 21 Dec 2023 through 21 Jan 2023, in Excel spreadsheets available from the American Institute of Physics (AIP); via their Roster of Physics Departments with Enrollment and Degree Data. Due to the structure of the data reporting, enrollment data are considered aggregated for the traditional academic year (beginning in the Fall of the previous year and ending in the Spring of the listed year); in addition to an aggregate of all degree recipients, awarded or extant, for the calendar extent (Jan -> Dec) of the listed year.

For example, in the 2015 report, data regarding enrollments are the finalized numbers of the 2014-2015 academic year; plus all conferred Bachelors (BS), MS, and PhD certificates from January through December (inclusive) of the 2015 calendar year.

2.1.1 Data Pipeline Processing

Datasheets were edited at an intermediary step to unify and homogenize data condensation into a tidy data set. Prior to 2017, the Highest Physics Degree Offered field and some associated enrollment data was not available in the survey. In our analysis, data were infilled on the condition that if a MS or PhD degree certificate was conferred during those years, then the corresponding program at the appropriate level must have existed; and if a higher level was available in 2017, then it must have been available in 2015 or 2016. The Notes annotations field was removed from this analysis, as the field appeared consistent across years, and after 2019 split into a separate datasheet of the same annual report- the Notes values and changelog are available at the source data.

Institution-level data were adjusted for varying spellings and canonizations over time (e.g. Appl Phy -> Appl Phys, Coll of -, etc.). For the purposes of this analysis, Institutions that changed from a College, University, or system designation are simply renamed to their name as of 2022, with information lost regarding that the change occurred. For a complete listing of the modifications made to Institution names, consult Lines 169-231 and 269-317 in the process used to incorporate this data. The data pipeline is encoded in a function, process_data(...), demonstrated in the following R script:

#----         Imports        ----#
#
# TODO :> these docs
#
#--------------------------------#
library(tidyverse)
library(readxl)
library(forcats)

#----     process_data(...)  ----#
#
# TODO :> these docs
#
#--------------------------------#
process_data <- \(DATA_DIR) {
  ##########
  ##-----##
  ##----##
  ##---##   Read Data
  ##----##
  ##-----##
  ##########
  data <- map(
    list.files(DATA_DIR, pattern="*.xlsx", full.names=T),
    \(.file) {
      #####
      ##  Excel Parse
      #####
      readxl::read_excel(
        .file,
        sheet = 'data',
        col_types = c(
          'text',    # Institution
          'text',    # State; 2-chr factor-level
          'text',    # Highest Degree Offered; 3-level factor of `BS`, `MS`, or `PhD`
          'text',    # Astro Program; 3-level factor of `combined`, `separate`, or `none`
          'text',    # Notes
          'numeric', # First-Term Introductory Physics Course Enrollments
          'numeric', # First-Term Introductory Physical Science and Astronomy Course Enrollments
          'numeric', # Fall Junior Enrollments
          'numeric', # Fall Senior Enrollments
          'numeric', # Fall Total Graduate Student Enrollments
          'numeric', # Fall Non-US Graduate Student Enrollments
          'numeric', # Fall First-Year Graduate Student Enrollments
          'numeric', # Physics Bachelors
          'numeric', # Exiting Physics Masters
          'numeric'  # Physics PhDs
        ),
        na = c('---', ''),
        .name_repair = \(cols) { # unify column names
          cols |>
            gsub('(Fall [1-2]{1}([0-1]|[8-9]){1}([0-9]){1}[0-9]{1})', 'Fall', x = _, perl=TRUE) |>
            gsub('(^(20[0-9]{2}\\-[0-9]{2})\\s+)|(\\-)', '', x = _, perl=TRUE) |>
            gsub('\\s*(\\w+)\\s+', '\\1_', x = _, perl=TRUE)
            #gsub('(First-Term)', 'FirstTerm', x = _, perl=TRUE)
        }
      ) |>

      #####
      ##  Denote Year
      #####
      mutate(
        Year = parse_number(
          paste0(
            '20',
            gsub(
                paste0(DATA_DIR, "*physrostr{0,1}([0-9]{2}).xlsx$"),
                "\\1",
                .file))),
        .before = Institution
      )
  }) |>

  ##########
  ##-----##
  ##----##
  ##---##   Process Data
  ##----##
  ##-----##
  ##########
  map( \(.tbl) {

    #####
    ##  transform `*_Enrollments` fields, added in (TODO:> ???)
    #####
    if ('Fall_Total_Graduate_Student_Enrollments' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Fall_Total_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
    if ('Fall_FirstYear_Graduate_Student_Enrollments' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Fall_FirstYear_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
    if ('Fall_Senior_Enrollments' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Fall_Senior_Enrollments = NA, .name_repair = 'unique') }
    if ('Fall_Junior_Enrollments' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Fall_Junior_Enrollments = NA, .name_repair = 'unique') }
    if ('Fall_NonUS_Graduate_Student_Enrollments' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Fall_NonUS_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
    if ('Highest_Physics_Degree_Offered' %in% names(.tbl) == F)
    { .tbl <- .tbl |> add_column(Highest_Physics_Degree_Offered = NA, .name_repair = 'unique') }

    .tbl |>

    #####
    ##  drop unused columns
    #####
    select(-any_of(c('Notes', 'Highest_Degree_Offered'))) |>

    #####
    ##  transform `Highest Physics Degree Offered`, added in 2017
    #####
    mutate(
      Highest_Physics_Degree_Offered = if_else(
        is.na(Highest_Physics_Degree_Offered),
        if_else(
          is.na(Physics_PhDs),
          if_else(
            is.na(Fall_Total_Graduate_Student_Enrollments),
            'BS',
            'MS'
          ),
          'PhD'
        ),
        Highest_Physics_Degree_Offered
    )) |>
    mutate( Highest_Physics_Degree_Offered = as.factor(Highest_Physics_Degree_Offered) ) |>
    mutate(
      `Highest_Physics_Degree_Offered` = fct_relevel( `Highest_Physics_Degree_Offered`, c('BS','MS','PhD'))
    ) |>

    #####
    ##  transform/convert `Astro Program` into factor
    #####
    mutate(
      Astro_Program = case_when(
          Astro_Program == 'c' ~ 'combined',
          Astro_Program == 's' ~ 'separate'
        )
    ) |>
    mutate(
      Astro_Program = as.factor(Astro_Program)
    ) |>
    mutate(
      `Astro_Program` = fct_relevel( `Astro_Program`, c('no dept.', 'separate', 'combined'))
    ) |>

    #####
    ##  transform State, Year into factors
    #####
    mutate(State = as.factor(State)) |>
    mutate(Year = as.factor(Year)) |>

    #####
    ##  transform `Appl Phy` -> `Appl Phys`
    #####
    mutate(
      Institution = gsub("(\\(Appl Phy\\))", "\\(Appl Phys\\)", Institution)
    ) |>
    #####
    ##  transform Institution name `College` -> `Coll`, drop apostrophe
    #####
    mutate(
      Institution = gsub(
        "((College(s){0,1})(\\s+(of)){0,1}\\s*\\w{0})$", "Coll \\4",
        Institution,
        perl = TRUE
      )
    ) |>
    mutate(
      Institution = Institution |>
        gsub("'", '', x = _) |>
        gsub("*", '', x = _) |>
        trimws(which = "both") |>
        str_squish()
    ) |>
    #####
    ##  transform Institution names for continuity
    #####
    mutate(
      Institution = Institution |>
          gsub("(Coll\\.)", "Coll", x=_) |>
          gsub("(\\*)", "", x=_) |>
          gsub("(Maryland-U of, Coll Park)", "Maryland-U of, College Park", x=_) |>
          gsub("(Minnesota-U of, Minnpls)", "Minnesota-U of, Minnpls/TwinCities", x=_) |>
          gsub("(Minnesota-U of, Twin Cities)", "Minnesota-U of, Minnpls/TwinCities", x=_) |>
          gsub("(Mary Baldwin Coll)", "Mary Baldwin U", x=_) |>
          gsub("(Piedmont Coll)", "Piedmont U", x=_) |>
          gsub("(William & Mary-Coll of)", "William & Mary", x=_) |>
          gsub("(SUNY Coll at Brockport)", "SUNY Brockport", x=_) |>
          gsub("(Notre Dame of MD-Coll of)", "Notre Dame of MD U", x=_) |>
          gsub("(Fresno State U)", "Cal St U-Fresno", x=_) |>
          gsub("(Muskingum Coll)", "Muskingum U", x=_) |>
          gsub("(Central Methodist Coll)", "Central Methodist U", x=_) |>
          gsub("(Indiana U Purdue U-Ft Wayne)", "Purdue U-Ft Wayne", x=_) |>
          gsub("(Purdue U-Calumet)", "Purdue U-Northwest", x=_) |>
          gsub("(Armstrong Atlantic St U)", "Armstrong State U", x=_) |>
          gsub("(Armstrong State U)", "Georgia Southern U", x=_) |>
          gsub("(Lynchburg Coll)", "Lynchburg-U of", x=_) |>
          gsub("(St. John Fisher Coll)", "St. John Fisher U", x=_) |>
          gsub("(Greenville Coll)", "Greenville U", x=_) |>
          gsub("(Bloomsburg U of PA)", "Commonwealth U of PA", x=_) |>
          gsub("(Roberts Wesleyan Coll)", "Roberts Wesleyan U", x = _) |>
          gsub("(Doane Coll)", "Doane U", x = _) |>
          gsub("(Simmons Coll)", "Simmons U", x = _) |>
          gsub("(Thomas More Coll)", "Thomas More U", x = _) |>
          gsub("(Linfield Coll)", "Linfield U", x = _) |>
          gsub("(Dordt Coll)", "Dordt U", x = _) |>
          gsub("(Otterbein Coll)", "Otterbein U", x = _) |>
          gsub("(Messiah Coll)", "Messiah U", x = _) |>
          gsub("(Sacramento State U)", "Cal St U-Sacramento", x = _) |>
          gsub("(Pennsylvania St U-Erie)", "Pennsylvania St Behrend", x = _) |>
          gsub("(New York U, Polytechnic Sch. of Eng.)", "New York U, Tandon Sch. of Engrg.", x = _) |>
          gsub("(Calvin Coll)", "Calvin U", x = _) |>
          gsub("(Augusta State U)", "Augusta U", x = _) |>
          gsub("(Elmhurst Coll)", "Elmhurst U", x = _) |>
          gsub("(Moravian Coll)", "Moravian U", x = _) |>
          gsub("(Augsburg Coll)", "Augsburg U", x = _) |>
          gsub("(Centre Coll of KY)", "Centre Coll", x = _) |>
          gsub("(Humboldt State U)", "Cal St Poly U-Humboldt", x = _) |>
          gsub("(Texas State U-San Marcos)", "Texas State U", x = _) |>
          gsub("(Richard Stockton Coll of NJ)", "Stockton U", x = _) |>
          gsub("(Chatham Coll)", "Chatham U", x = _) |>
          gsub("(The Sciences of Philadelphia-U of)", "The Sciences-U of", x = _) |>
          gsub("(Baldwin-Wallace Coll)", "Baldwin-Wallace U", x = _) |>
          gsub("(St\\. Catherine-Coll of)", "St. Catherine U", x = _) |>
          gsub("(Walla Walla Coll)", "Walla Walla U", x = _) |>
          gsub("(New York U \\(NYU\\))", "New York U, School of Arts & Science", x = _) |>
          gsub("(Engrg\\.g\\.)", "Engrg.", x = _) |>
          gsub("(Whitworth Coll)", "Whitworth U", x = _) |>
          gsub("(King Coll)", "King U", x = _) |>
          gsub("(NJIT/Rutgers U-Newark)", "New Jersey Inst of Tech", x = _) |>
          gsub("(Rutgers U-Newark/NJIT)", "Rutgers U-Newark", x = _) |>
          gsub("(St\\. Peters Coll)", "St. Peters U", x = _) |>
          gsub("(MO-U of, Rolla)", "Missouri U of Sci & Tech", x = _) |>
          gsub("(Metropolitan St Coll of Denver)", "Metropolitan St U of Denver", x = _) |>
          gsub("(Mesa State Coll)", "Colorado Mesa U", x = _) |>
          gsub("(Southern Polytechnic St U)", "Kennesaw State U", x = _) |>
          gsub("(Bridgewater State Coll)", "Bridgewater State U", x = _) |>
          gsub("(W\\. Virginia Wesleyan Coll)", "West Virginia Wesleyan Coll", x = _) |>
          gsub("(Cal Poly St U-San L\\.O\\.)", "Cal Poly St U-San Luis Obispo", x = _) |>
          gsub("(Elon Coll)", "Elon U", x = _) |>
          gsub("(Manchester Coll)", "Manchester U", x = _) |>
          gsub("(Utah Valley State Coll)", "Utah Valley U", x = _) |>
          gsub("(Mount Union Coll)", "Mount Union-U of", x = _) |>
          gsub("(Albertson Coll of Idaho)", "Coll of Idaho", x = _) |>
          gsub("(Cal St U-Hayward)", "Cal St U-East Bay", x = _) |>
          gsub("(Point Loma Nazarene Coll)", "Point Loma Nazarene U", x = _) |>
          gsub("(Colorado St U, Fort Collins)", "Colorado St U-Fort Collins", x = _) |>
          gsub("(Colorado State U)", "Colorado St U-Fort Collins", x = _) |>
          gsub("(Yale U \\(Appl Sci\\))", "Yale U (Appl Phys)", x = _) |>
          gsub("(West Georgia-State U of)", "West Georgia-U of", x = _) |>
          gsub("(North Georgia Coll & St U)", "North Georgia-U of", x = _) |>
          gsub("(Notre Dame-Coll of, MD)", "Notre Dame of MD U", x = _) |>
          gsub("(Cumberland Coll)", "Cumberlands-U of the", x = _) |>
          gsub("(Missouri Southern St Coll)", "Missouri Southern St U", x = _) |>
          gsub("(Missouri St U)", "Missouri State U", x = _) |>
          gsub("(Central Missouri State U)", "Central Missouri-U of", x = _) |>
          gsub("(Southwest Missouri State U)", "Missouri State U", x = _) |>
          gsub("(Georgian Court Coll)", "Georgian Court U", x = _) |>
          gsub("(OK-U of Sci and Arts)", "Sci and Arts of OK-U of", x = _) |>
          gsub("(Southwest Texas St U)", "Texas State U", x = _) |>
          gsub("(Houston-U of-Downtown)", "Houston-U of, Downtown", x = _) |>
          gsub("(Mary Washington Coll)", "Mary Washington-U of", x = _) |>
          gsub("(Virginia Tech)", "Virginia Polytech Inst & St U", x = _) |>
          gsub("(Randolph-Macon Womans Coll)", "Randolph Coll", x = _)
    ) |>
    mutate(
      Institution = case_when(
        Institution == "Augustana Coll" & State == "SD" ~ "Augustana U",
        Institution == "Xavier U" & State == "LA" ~ "Xavier U of Louisiana",
        Institution == "Union Coll" & State == "NY" ~ "Union Coll (NY)",
        Institution == "Union Coll" & State == "NE" ~ "Union Coll (NE)",
        Institution == "Westminster Coll" & State == "PA" ~ "Westminster Coll (PA)",
        Institution == "Westminster Coll" & State == "UT" ~ "Westminster Coll (UT)",
        Institution == "Westminster Coll" & State == "MO" ~ "Westminster Coll (MO)",
        Institution == "St. Thomas-U of" & State == "MN" ~ "St. Thomas-U of (MN)",
        Institution == "St. Thomas-U of" & State == "TX" ~ "St. Thomas-U of (TX)",
        Institution == "Wheaton Coll" & State == "IL" ~ "Wheaton Coll (IL)",
        Institution == "Wheaton Coll" & State == "MA" ~ "Wheaton Coll (MA)",
        Institution == "Embry-Riddle Aeronautical U" & State == "FL" ~ "Embry-Riddle Aeronautical U (FL)",
        Institution == "Embry-Riddle Aeronautical U" & State == "AZ" ~ "Embry-Riddle Aeronautical U (AZ)",
        Institution == "Georgetown U" & State == "KY" ~ "Georgetown Coll",
        Institution == "Lincoln U" & State == "MO" ~ "Lincoln U (MO)",
        Institution == "Lincoln U" & State == "PA" ~ "Lincoln U (PA)",
        Institution == "Bethel Coll" & State == "MN" ~ "Bethel U",
        Institution == "St. Johns U" & State == "MN" ~ "Coll of St. Benedict / St. Johns U",
        Institution == "Loyola Coll" & State == "MD" ~ "Loyola U of MD",
        .default = Institution
      )
    ) |>
    ## remove `TN-U of, Space Inst`, dupl by `TN-U of, Knoxville`, the host Inst.
    #filter(!(Institution == "TN-U of, Space Inst")) |>

    #####
    ## set column order
    #####
    relocate(
      `Year`,
      `Institution`,
      `State`,
      `Highest_Physics_Degree_Offered`,
      `Fall_Total_Graduate_Student_Enrollments`,
      `Physics_PhDs`,
      `Exiting_Physics_Masters`,
      `Fall_FirstYear_Graduate_Student_Enrollments`,
      `Physics_Bachelors`,
      `Fall_Senior_Enrollments`,
      `Fall_Junior_Enrollments`,
      `FirstTerm_Introductory_Physics_Course_Enrollments`,
      `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`,
      `Fall_NonUS_Graduate_Student_Enrollments`,
      `Astro_Program`
    ) |>
    select(
      `Year`,
      `Institution`,
      `State`,
      `Highest_Physics_Degree_Offered`,
      `Fall_Total_Graduate_Student_Enrollments`,
      `Physics_PhDs`,
      `Exiting_Physics_Masters`,
      `Fall_FirstYear_Graduate_Student_Enrollments`,
      `Physics_Bachelors`,
      `Fall_Senior_Enrollments`,
      `Fall_Junior_Enrollments`,
      `FirstTerm_Introductory_Physics_Course_Enrollments`,
      `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`,
      `Fall_NonUS_Graduate_Student_Enrollments`,
      `Astro_Program`
    )
  }) |>

  ##########
  ##-----##
  ##----##
  ##---##   Save Data
  ##----##
  ##-----##
  ##########
  ## unify observation collection
  list_rbind() |>
  ## group observations of Institution by Year, then by State
  group_by(State, Institution, Year) |>
  ## sort-asc within previous group by quantity
  arrange(Physics_PhDs, .by_group = TRUE)

  ##########
  ##-----##
  ##----##
  ##---##   Targeted Adjustments
  ##----##
  ##-----##
  ##########

  ## `Georgia Southern U`
  gdata <-
    data |>
    filter(Institution == "Georgia Southern U")

  data <- anti_join(data, gdata, by = 'Institution')

  gdata <-
    gdata |>
    ungroup() |>
    group_by(Year) |>
    summarise(
      `Year` = Year,
      `Institution` = Institution,
      `State` = State,
      `Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
      `Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
      `Physics_PhDs` = sum(Physics_PhDs),
      `Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
      `Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
      `Physics_Bachelors` = sum(Physics_Bachelors),
      `Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
      `Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
      `FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
      `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
      `Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
      `Astro_Program` = Astro_Program
    ) |>
    distinct()

  data <- data |> ungroup()
  data <- full_join(data, gdata)

  ## `UT-Brownsville` + `UT-PanAm` -> `UT-RioGrandeValley`
  gdata <-
    data |>
    filter(Institution %in% c(
      "Texas-U of, at Brownsville",
      "Texas-U of, Pan American",
      "Texas-U of, Rio Grande Valley"
    ))

  data <- anti_join(data, gdata, by = 'Institution')

  gdata <-
    gdata |>
    ungroup() |>
    mutate(
      Institution = "Texas-U of, Rio Grande Valley",
      `Highest_Physics_Degree_Offered` = case_when(
       `Highest_Physics_Degree_Offered` == 'BS' ~ 'MS',
       .default = `Highest_Physics_Degree_Offered`
      )
    ) |>
    group_by(Year) |>
    summarise(
      `Year` = Year,
      `Institution` = Institution,
      `State` = State,
      `Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
      `Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
      `Physics_PhDs` = sum(Physics_PhDs),
      `Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
      `Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
      `Physics_Bachelors` = sum(Physics_Bachelors),
      `Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
      `Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
      `FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
      `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
      `Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
      `Astro_Program` = Astro_Program
    ) |>
    distinct()

  data <- data |> ungroup()
  data <- full_join(data, gdata)

  ## `Mansfield U` + `Lock Haven U` + `Bloomsburg U` -> `Commonwealth U of PA`
  gdata <-
    data |>
    filter(Institution %in% c(
      "Mansfield U",
      "Lock Haven U",
      "Bloomsburg U",
      "Commonwealth U of PA"
    ))

  data <- anti_join(data, gdata, by = 'Institution')

  gdata <-
    gdata |>
    ungroup() |>
    mutate(
      Institution = "Commonwealth U of PA",
    ) |>
    group_by(Year) |>
    summarise(
      `Year` = Year,
      `Institution` = Institution,
      `State` = State,
      `Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
      `Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
      `Physics_PhDs` = sum(Physics_PhDs),
      `Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
      `Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
      `Physics_Bachelors` = sum(Physics_Bachelors),
      `Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
      `Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
      `FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
      `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
      `Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
      `Astro_Program` = Astro_Program
    ) |>
    distinct()

  data <- data |> ungroup()
  data <- full_join(data, gdata)
  ## Group and Sort
  data <- data |>
    #mutate(idx = row_number()) |>
    ## group observations of Institution by Year, then by State
    group_by(State, Institution, Year) |>
    ## sort-asc within previous group by quantity
    arrange(Physics_PhDs, .by_group = TRUE)

  ##########
  ##-----##
  ##----##
  ##---##   Return
  ##----##
  ##-----##
  ##########
  data
}

2.2 Table

Columns have summary metrics generated for groupings of observations, given an Institution over a set of Years. These have been marked as Accumulative or Averaged. Data at the individual observation level (foreach Institution, foreach Year) have been altered from their original values - incl. Georgia Southern U, 2, 3, TODO which combined with another institution in 2018.


library(tidyverse)
library(reactable)
library(ggplot2)
library(ggsci)
library(ggformula)
library(plotly)
library(htmltools)
library(crosstalk)

DATA_DIR = './data/xlsx_edit/'
source('./process_data.R', local = knitr::knit_global())

##########
##-----##
##----##
##---##   Data Table
##----##
##-----##
##########
rosterdata <- process_data(DATA_DIR);

#####
## Build sharedData instances with enumerated `row.names` (keys); filter and dispatch auto-linked
## SharedData s.t. selections can be made across table and graphs while allowing interpolated spline linegraphs
#####
tdata <- 
  highlight_key(rosterdata, ~Institution, group = 'linegrph')

d = list(
  `Physics_PhDs` = highlight_key(
    rosterdata |>
      filter(!is.na(Physics_PhDs)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(Physics_PhDs)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Exiting_Physics_Masters` = highlight_key(
    rosterdata |>
      filter(!is.na(Exiting_Physics_Masters)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Exiting_Physics_Masters`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Fall_FirstYear_Graduate_Student_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(Fall_FirstYear_Graduate_Student_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Fall_FirstYear_Graduate_Student_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Fall_NonUS_Graduate_Student_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(Fall_NonUS_Graduate_Student_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Fall_NonUS_Graduate_Student_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Fall_Total_Graduate_Student_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(Fall_Total_Graduate_Student_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Fall_Total_Graduate_Student_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Physics_Bachelors` = highlight_key(
    rosterdata |>
      filter(!is.na(Physics_Bachelors)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Physics_Bachelors`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Fall_Senior_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(Fall_Senior_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Fall_Senior_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `Fall_Junior_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(Fall_Junior_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`Fall_Junior_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `FirstTerm_Introductory_Physics_Course_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(FirstTerm_Introductory_Physics_Course_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`FirstTerm_Introductory_Physics_Course_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  ),
  `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = highlight_key(
    rosterdata |>
      filter(!is.na(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments)) |>
      (\(.tbl) {
        .holdout <-
          .tbl |>
          filter(!is.na(`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`)) |>
          group_by(Institution) |>
          mutate(count = n()) |>
          filter(count < 4) |>
          select(Institution) |>
          unique() |>
          deframe()
    
        .holdout_obs <-
          .tbl |>
          filter(Institution %in% .holdout)
    
        .rest <- anti_join(.tbl, .holdout_obs)
    
        .rest
      })(),
    key = ~Institution,
    group = 'linegrph'
  )
)

#####
##  Generate Reactable data table
#####
rxtbl <- tdata |>
    reactable(
        ####
        ##  main table
        ####
        elementId = 'rosterphys02_22-tbl',
        filterable = T,
        searchable = T,
        groupBy = c('Institution'),
        bordered = T,
        #striped = T,
        highlight = T,
        compact = T,
        fullWidth = T, 
        pagination = F,
        showPageSizeOptions = T,
        pageSizeOptions = c(6, 12, 18),
        paginationType = "jump",
        selection = "multiple",
        #onClick = "select",
        height = 768,
        rowStyle = JS("
          function(rowInfo, state) {
            if (!rowInfo) return;
          
            // style nested rows
            if (rowInfo.level > 0) {
              return { background: '#eee', borderLeft: '2px solid #ffa62d' }
            } else {
              return { borderLeft: '2px solid transparent' }
            }
        
          }
        "),
        #details = function(idx) {
        #  d <- rosterdata |> filter(Institution == rosterdata$Institution[idx])
        #  htmltools::div(style = "padding: 1rem",
        #    reactable(
        #      d, 
        #      outlined = TRUE
        #    )
        #  )
        #},
        defaultColDef = colDef(
            header = function(value) gsub(".", " ", value, fixed = TRUE),
            headerClass = "sticky tbl-header",
            cell = function(value) format(value, nsmall = 1),
            filterable = F,
            align = "center",
            minWidth = 120,
            headerStyle = list(background = "#f7f7f8"),
            vAlign = 'center',
            headerVAlign = 'bottom',
            format = colFormat(
                separators = T,
                digits = 0
            )
        ),
        defaultSorted = list(
            Year = 'asc',
            Physics_PhDs = 'desc',
            Physics_Bachelors = 'desc'
        ),
        defaultPageSize = 6,
        minRows = 4,
        ####
        ##  Per-Column defns
        ####
        columns = list(
          `Year` = colDef(
            name = 'Year',
            align = "center",
            minWidth = 64,
            sticky = "left",
            sortable = T,
            defaultSortOrder = "asc"
          ),
          `Institution` = colDef(
            name = 'Institution',
            align = 'left',
            minWidth = 240,
            sticky = "left",
            filterable = T
          ),
          `State` = colDef(
            name = 'State',
            minWidth = 64,
            aggregate = 'unique',
            filterable = T
          ),
          `Highest_Physics_Degree_Offered` = colDef(
            name = "Highest Physics Degree Offered",
            aggregate = "unique",
          ),
          `Fall_Total_Graduate_Student_Enrollments` = colDef(
            name = "Total Grad Student Enrollment (Fall)\n (Averaged)",
            aggregate = "mean",
          ),
          `Physics_PhDs` = colDef(
            name = "Physics PhDs\n (Accum)",
            aggregate = "sum",
            format = colFormat(digits=0)
          ),
          `Exiting_Physics_Masters` = colDef(
            name = "Exiting Physics Masters\n (Accum)",
            aggregate = "sum",
            format = colFormat(digits=0)
          ),
          `Fall_FirstYear_Graduate_Student_Enrollments` = colDef(
            name = "First-Year Grad Student Enrollment (Fall)\n (Averaged)",
            aggregate = "mean",
          ),
          `Physics_Bachelors` = colDef(
            name = "Physics Bachelors\n (Accum)",
            aggregate = "sum",
            format = colFormat(digits=0)
          ),
          `Fall_Senior_Enrollments` = colDef(
            name = "Senior Enrollment (Fall)\n (Averaged)",
            aggregate = "mean",
          ),
          `Fall_Junior_Enrollments` = colDef(
            name = "Junior Enrollment (Fall)\n (Averaged)",
            aggregate = "mean",
          ),
          `FirstTerm_Introductory_Physics_Course_Enrollments` = colDef(
            name = "First-Term Intro Physics Course Enrollment\n (Averaged)",
            aggregate = "mean",
          ),
          `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = colDef(
            name = "First-Term Intro Phys. Sci. and Astro. Course Enrollment\n (Averaged)",
            aggregate = "mean",
          ),
          `Fall_NonUS_Graduate_Student_Enrollments` = colDef(
            name = "Non-US Grad Student Enrollment (Fall)\n (Averaged)",
            aggregate = "mean",
          ),
          `Astro_Program` = colDef(
            name = "Astro Program?",
            minWidth = 80,
            align = "right",
            aggregate = JS("
              function(values, rows) {
                return(values
                        .filter((v) => v == 'combined' || v == 'separate')
                        .map((v) => v == 'combined' ? 'c' : 's')
                        .reduce(function (acc, curr) {
                            if (!acc.includes(curr))
                                acc.push(curr);
                            return acc;
                        }, [])
                        .join(', '));
              }
            ")
          )
        )
    )

#####
##  Display Shiny with Crosstalk/htmltools widgets
#####
shiny::fluidPage(
  shiny::fluidRow(
    shiny::column(
      2,
      htmltools::browsable(
        tagList(
          tags$div(
            class = 'd-grid gap-2 mx-auto',
            # CSV download button
            tags$button(
              tagList(fontawesome::fa("download"), "\tCSV Data"),
              class = 'btn btn-outline-success',
              onclick = "Reactable.downloadDataCSV('rosterphys02_22-tbl', 'rosterphys02_22.csv')"
            ),
            # Expand/Collapse button
            tags$button(
              "Expand/Collapse\nRows",
              class = 'btn btn-info',
              onclick = "Reactable.toggleAllRowsExpanded('rosterphys02_22-tbl')",
            ),
          )
        )
      ),
      # filter-by degree
      #filter_checkbox("degree", "Degree Level", tdata, ~Highest_Physics_Degree_Offered),
      # TODO :> sticky column toggle button
      # Group Selection
      htmltools::browsable(
        tagList(
          tags$div(
            class = "p-2",
            tags$b(
              tags$label("Group By", `for` = "rosterphys02_22-grp_select"),
            ),
            tags$ul(
              id = "rosterphys02_22-grp_select",
              class = "list-group",
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  value = "State",
                  id = "grp_select-state",
                  onchange = "Reactable.setGroupBy(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
                    .map((e) => e.checked ? e.value : false)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "State",
                  class = "form-check-label",
                  `for` = "grp_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  value = "Highest_Physics_Degree_Offered",
                  id = "grp_select-program",
                  onchange = "Reactable.setGroupBy(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
                    .map((e) => e.checked ? e.value : false)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Program",
                  class = "form-check-label",
                  `for` = "grp_select-program",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  value = "Year",
                  id = "grp_select-year",
                  onchange = "Reactable.setGroupBy(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
                    .map((e) => e.checked ? e.value : false)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Year",
                  class = "form-check-label",
                  `for` = "grp_select-year",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  value = "Institution",
                  id = "grp_select-inst",
                  checked = NA,
                  onchange = "Reactable.setGroupBy(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
                    .map((e) => e.checked ? e.value : false)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Institution",
                  class = "form-check-label",
                  `for` = "grp_select-inst",
                )
              ),
            )
          ),
          # column visibility selection
          tags$div(
            class = "p-2",
            tags$b(
              tags$label("Visible Columns", `for` = "rosterphys02_22-vis_select"),
            ),
            tags$ul(
              id = "rosterphys02_22-vis_select",
              class = "list-group",
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Fall_Total_Graduate_Student_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Total Grad Stdnts",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Physics_PhDs",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "PhDs",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Exiting_Physics_Masters",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "MS",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Fall_FirstYear_Graduate_Student_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "FirstYear Grad Stdnts",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Physics_Bachelors",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "BS",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Fall_Senior_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Seniors",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Fall_Junior_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Juniors",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "FirstTerm_Introductory_Physics_Course_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Intro Phys",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "PhySci+Astro",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Fall_NonUS_Graduate_Student_Enrollments",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "NonUS Grad Stdnts",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
              tags$li(
                class = "list-group-item",
                tags$input(
                  class = "form-check-input me-1",
                  type = "checkbox",
                  checked = NA,
                  value = "Astro_Program",
                  id = "vis_select-state",
                  onchange = "Reactable.setHiddenColumns(
                    'rosterphys02_22-tbl',
                    [...document.querySelectorAll('#rosterphys02_22-vis_select li input[type=checkbox]')]
                    .map((e) => e.checked ? false : e.value)
                    .filter((w) => w !== false)
                  )"
                ),
                tags$label(
                  "Astro Dept",
                  class = "form-check-label",
                  `for` = "vis_select-state",
                )
              ),
            )
          ),
        )
      ),
    ),
    shiny::column(
      10,
      # DataTable
      rxtbl,
      # TODO :> add line-graph over time here using selected data from table
      ## Prints all overlaid, 
      htmltools::browsable(
        tagList(
          tags$div(
            id = 'linegrphs-all',
            tags$nav(
              tags$div(
                class = "nav nav-tabs",
                role = "tablist",
                id = "linegrphs-tabs",
                
                # PHDs
                tags$button(
                  class = 'nav-link active',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'true',
                  `data-bs-target` = '#linegrph_physicsphd-tabcontent',
                  `aria-controls` = 'linegrph_physicsphd-tabcontent',
                  id = 'linegraph_physicsphd-tab',
                  "PhDs"
                ),
                
                # Masters
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_masters-tabcontent',
                  `aria-controls` = 'linegrph_masters-tabcontent',
                  id = 'linegraph_masters-tab',
                  "Masters"
                ),
                
                # Total Grad Stdnts
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_totalgrads-tabcontent',
                  `aria-controls` = 'linegrph_totalgrads-tabcontent',
                  id = 'linegraph_totalgrads-tab',
                  "Total Grad Stdnts"
                ),
                
                # FirstYear Grad Stdnts
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_fygrads-tabcontent',
                  `aria-controls` = 'linegrph_fygrads-tabcontent',
                  id = 'linegraph_fygrads-tab',
                  "FirstYear Grad Stdnts"
                ),
                
                # NonUS Grad Stdnts
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_nonusgrads-tabcontent',
                  `aria-controls` = 'linegrph_nonusgrads-tabcontent',
                  id = 'linegraph_nonusgrads-tab',
                  "NonUS Grad Stdnts"
                ),
                
                # Bachelors
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_bachelors-tabcontent',
                  `aria-controls` = 'linegrph_bachelors-tabcontent',
                  id = 'linegraph_bachelors-tab',
                  "Bachelors"
                ),
                
                # Seniors
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_seniors-tabcontent',
                  `aria-controls` = 'linegrph_seniors-tabcontent',
                  id = 'linegraph_seniors-tab',
                  "Seniors"
                ),
                
                # Juniors
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_juniors-tabcontent',
                  `aria-controls` = 'linegrph_juniors-tabcontent',
                  id = 'linegraph_juniors-tab',
                  "Juniors"
                ),
                
                # Intro Phys
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_introphys-tabcontent',
                  `aria-controls` = 'linegrph_introphys-tabcontent',
                  id = 'linegraph_introphys-tab',
                  "Intro Phys"
                ),
                
                # PhySci+Astro
                tags$button(
                  class = 'nav-link',
                  `data-bs-toggle` = 'tab',
                  type = 'button',
                  role = 'tab',
                  `aria-selected` = 'false',
                  `data-bs-target` = '#linegrph_physastro-tabcontent',
                  `aria-controls` = 'linegrph_physastro-tabcontent',
                  id = 'linegraph_physastro-tab',
                  "PhySci+Astro"
                ),
                
              ),
            ),
            
            tags$div(
              class = 'tab-content',
              id = 'linegrph_tabscontent',
              
              # PhDs
              tags$div(
                class = 'tab-pane fade show active',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_physicsphd-tab',
                id = 'linegrph_physicsphd-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Physics_PhDs,
                        aes(
                           x = Year, 
                           y = Physics_PhDs, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "Conferred Physics PhDs by Institution per Year",
                          x = "Year",
                          y = "Physics PhDs"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Masters
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_masters-tab',
                id = 'linegrph_masters-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Exiting_Physics_Masters,
                        aes(
                           x = Year, 
                           y = `Exiting_Physics_Masters`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "Conferred Exiting Masters by Institution per Year",
                          x = "Year",
                          y = "Exiting Masters"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Total Grad Stdnts
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_totalgrads-tab',
                id = 'linegrph_totalgrads-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Fall_Total_Graduate_Student_Enrollments,
                        aes(
                           x = Year, 
                           y = `Fall_Total_Graduate_Student_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "Total (Fall) Graduate Student Enrollments by Institution per Year",
                          x = "Year",
                          y = "Total (Fall) Graduate Student Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # FirstYear Grad Stdnts
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_fygrads-tab',
                id = 'linegrph_fygrads-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Fall_FirstYear_Graduate_Student_Enrollments,
                        aes(
                           x = Year, 
                           y = `Fall_FirstYear_Graduate_Student_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "(Fall) First-Year Graduate Student Enrollments by Institution per Year",
                          x = "Year",
                          y = "(Fall) First-Year Graduate Student Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # NonUS Grad Stdnts
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_nonusgrads-tab',
                id = 'linegrph_nonusgrads-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Fall_NonUS_Graduate_Student_Enrollments,
                        aes(
                           x = Year, 
                           y = `Fall_NonUS_Graduate_Student_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "(Fall) Non-US Graduate Student Enrollments by Institution per Year",
                          x = "Year",
                          y = "(Fall) Non-US Graduate Student Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Bachelors
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_bachelors-tab',
                id = 'linegrph_bachelors-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Physics_Bachelors,
                        aes(
                           x = Year, 
                           y = `Physics_Bachelors`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "Physics Bachelors Conferred by Institution per Year",
                          x = "Year",
                          y = "Physics Bachelors Conferred"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Seniors
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_seniors-tab',
                id = 'linegrph_seniors-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Fall_Senior_Enrollments,
                        aes(
                           x = Year, 
                           y = `Fall_Senior_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "(Fall) Senior Enrollments by Institution per Year",
                          x = "Year",
                          y = "(Fall) Senior Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Juniors
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_juniors-tab',
                id = 'linegrph_juniors-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$Fall_Junior_Enrollments,
                        aes(
                           x = Year, 
                           y = `Fall_Junior_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "(Fall) Junior Enrollments by Institution per Year",
                          x = "Year",
                          y = "(Fall) Junior Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # Intro Phys
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_introphys-tab',
                id = 'linegrph_introphys-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$FirstTerm_Introductory_Physics_Course_Enrollments,
                        aes(
                           x = Year, 
                           y = `FirstTerm_Introductory_Physics_Course_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "First-Term Intro Physics Course Enrollments by Institution per Year",
                          x = "Year",
                          y = "First-Term Intro Physics Course Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
              # PhySci+Astro
              tags$div(
                class = 'tab-pane fade',
                role = 'tabpanel',
                `aria-labelledby` = 'linegrph_physastro-tab',
                id = 'linegrph_physastro-tabcontent',
                
                ggplotly(
                  width = 0.9*1280,
                  height = 0.9*720,
                  p = ggplot(
                        d$FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments,
                        aes(
                           x = Year, 
                           y = `FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`, 
                           group = Institution, color = Institution
                        )) +
                        geom_point(
                          shape = 19,
                          size = 0.3,
                          alpha = 0.7
                        ) +
                        geom_spline(
                          df = after_stat("count"),
                          df.offset = 20,
                          tol = 0.1,
                          #df = list(6, 3),
                          na.rm = F ## only suppresses the warning on removal
                        ) +
                        labs(
                          title = "First-Term Intro Physical Science and Astronomy Course Enrollments by Institution per Year",
                          x = "Year",
                          y = "First-Term Intro Phys. Sci. and Astro. Course Enrollments"
                        ) +
                        scale_color_manual( 
                          values = rep( 
                            pal_jama()(7),
                            (length(unique(rosterdata$Institution)) / 7) + 7 
                          )
                        )
                ) |> 
                style(
                  showlegend = F, traces = c(1:length(unique(rosterdata$Institutions)))
                ) |>
                highlight( 
                  opacityDim = 0.02,
                  persistent = TRUE,
                  selected = attrs_selected(showlegend = T)
                ) |> toWebGL()
              ),
              
            )
          )
        )
      )
    )
  )
)

2.3 Observations and Insights

Multiple notes arise from a brief overview of this collection:

2.3.1 Summary of Degree Type by Unique Institutions

print('TODO:> .')
## [1] "TODO:> ."
  • A natural boundary to delineate the 825 unique Institutions in this set is in the categorical division of Highest Physics Degree Offered (BS, MS, and PhD) in the most recent year of the roster publication. We will return to this Bayesian decision boundary of the search space later. Furthermore, we will consider PhD-granting Institutions as separate from the combined group of BS/MS-granting Institutions.

2.3.2 PhD Availability by Region

## `Institutions offering a PhD program as of 2022
phds_2022 <- 
  rosterdata |>
  filter(
    (
      Highest_Physics_Degree_Offered == 'PhD' &
      Year == 2022
    )
  ) |>
  ungroup() |>
  select(State, Institution) |>
  distinct() |>
  group_by(State)

# TODO :> Use reactable alongside density heatmap at State/Region level
phds_2022
  • There is at least one Institution in every state and territory, except the US Virgin Islands, that has a Physics PhD program.

  • There are 209 unique Institutions offering a PhD as of 2022.

2.3.3 Temporal Changes of Observations

print('TODO:> .')
## [1] "TODO:> ."
  • The University of Minnesota (MN) - Minneapolis offered a PhD through 2015-2019 (the past extent of the dataset), but is no longer part of the reported data set. However, the University of Minnesota - Twin Cities in 2020-Current, despite having no data prior to 2020, produced similar numbers to the prior Institution. These two should be merged for the purposes of this analysis, as if the Physics department is continuous through these years across both Institutions. Similar behavior occurs in a handful of other institutions.

  • TODO :> Multiple universities also changed their program designation, with some recently adding or dropping their PhD programs, or morphing their MS programs into a PhD.

2.3.4 Applied Physics Departments

  print('TODO:> .')
## [1] "TODO:> ."
  • Applied Physics PhDs are granted by a separate department housed in another College, of Engineering or similar, complementary to an Institution’s existing Physics department. Only TODO Institutions of this type (for our analysis) exist: specifically Stanford, Yale, Harvard, Univ. Of Michigan - Ann Arbor, Columbia, Cornell, and Rice Universities. New York U also has this split between its College of Liberal Arts and Sciences and an Engineering College, but the distinction only appears at the BS/MS level starting in 20??, while Kettering U has an Applied Physics program at the BS level. UC-Davis had an Applied program that stopped reporting in 2010.

2.3.5 Top 10 States by accumulative Physics PhDs

print('TODO:> .')
## [1] "TODO:> ."
  • The Top 10 ranking of cumulative production since 2015 of Physics PhDs is, in descending order: CA (1926), NY (1154), MA (1105), TX (907), IL (700), FL (700), PA (623), OH (599), MI (530), and CO (419). We should ensure we perform a population-corrected analysis of the per-capita rate of completing a PhD in these states. Interestingly, this ranking is disjoint from the same ordering for the production of Physics Bachelors. Yet another ranking is produced when ordered by descending quantity of the three degree levels (BS, MS, and PhD).

2.3.6 Top 20 Institutions by accumulative Physics_PhDs

# TODO :> TOP 20 ranking
  • The Top 20 ranking Institutions of cumulative production over the past decade ordered by PhD, alongside accumulative BS degrees conferred are:
Institution Physics PhDs Physics Bachelors
CA-U of, Berkeley 286 931
IL-U of, Urbana/Champaign 270 1129
Colorado-U of, Boulder 263 663
Harvard U 262 423
Ohio State U 252 586
Maryland-U of, Coll Park 241 576
Stanford U 225 197
Texas-U of, at Austin 186 635
Mass Inst of Tech (MIT) 184 376
Cornell U 182 368
Chicago-U of 181 449
WI-U of, Madison 181 387
Michigan-U of, Ann Arbor 180 439
CA-U of, Los Angeles 170 644
SUNY-Stony Brook 169 423
CA-U of, San Diego 169 405
Texas A&M-College Station 168 265
Washington-U of 163 1286
Minnesota-U of, TwinCities 163 416
Princeton U 161 203

2.3.7 TODO :>

  • TODO :> How many departments inconsistently reported data?

  • TODO :> How many departments ceased reporting per each year, any departments end their PhD or BS program?

  • TODO :> COVID-19 impacts?

  • TODO :> Frequency Distribution of accumulative PhDs

  • TODO :> Top 10 by accumulative Physics Bachelors? Frequency Distribution?

    *   what the f is going on with the higher number of exiting physics masters (like U Washington)?
  • TODO :> Filter by latest year reported + frequency graph/histogram by year, earliest year reported, all NA (e.g., dead departments, no reports, new depts).

    *   inconsistent reporting, `NA` reporting by `Year`
  • TODO :> What is the boundary/difference in Phys bachelors production and undergrad enrollment figures between programs that offer BS/MS as highest vs PhD as highest?

  • TODO :> Use shapefiles to demonstrate gradient of universities and availability of programs

    • Maybe leave this summary graphic until County-level data is established.

2.4 Additional Data

  • is HBCU

  • is historically womens’

  • is private/public

  • is profit/non-profit

  • is private christian

  • is land-grant

  • Faculty count?

  • Institution-level COMMON dataset

  • Voronoi-Cell approximation of nearest PhD or BS

  • Department Specialties? <- n-gram analysis

  • Undergraduate population total enrollments?

  • County of State that Institution is located in

  • Median Salary of surrounding

  • Percentile of Median Salary of county relative to State

  • Percentile median salary of county Relative to Country?

  • Gini Index.

  • Measure of Racial Segregation, Ethnic Diversity metrics?

  • Median Rent of surrounding

  • % vote in last election?

  • Political Party of State governor?

  • Online Program availability?

  • Has nearby or attached National Lab (as.factor(...) with Y/N levels; sep col of assoc. lab)

  • Link to faculty/dept page

  • FAFSA ID/Federal School Code

  • R1 research designation

  • Legal recreational weed

  • no pitbull bans

  • per-capita incidence rate of population that are women between 23-40 with at least a bachelors degree

  • per-capita incidence rate of population that are men between 20-50 with at least a bachelors degree

  • estimate ratio of female/male annual median income


3 Methods and Analysis

3.1 Problem-Domain Adjusted and Summary Metrics

  • TODO :> Ratio of Accumulative PhDs produced over timeframe to average number of graduate students enrolled per year

  • TODO :> Interpolation across NA rows?

3.1.1 Resiliency and Conversion Scores

  • TODO :> Hertzsprung-Russell Diagram here https://observablehq.com/@d3/hertzsprung-russell-diagram?intent=fork

  • PhD resiliency has an open definition, should include the number of produced MS students as well

  • Resiliency is a conversion score, but should compare number of produced combined BS/MS students to sum of prev year’s Senior and Junior declared major enrollments

  • Conversion Score 1; BS: \[ raw\_score = \Big(1 - \frac{N\_Juniors_{t-1}}{N\_Seniors_t} \Big) + \epsilon_t \]

    • Then, compute z-score and student’s T-dist of raw_score to identify deviation from trend line
  • Conversion Score 2 should compare number of produced BS to prev years Senior enrollments

3.1.2 Per-Capita PhDs Conferred vs. Undergrad Physics Enrollment Population

3.1.3 Altered Data


3.2 Bollinger-Bands Volatility Metric

Roughly defined econometric using distance between moving-average upper and lower bounds at sampled time \(t\).

3.3 Time-Series Regression

3.4 XGBoost Random Forest

3.5 Outlier Analysis vs Trend-Line


4 Results


5 Discussion


6 References